home *** CD-ROM | disk | FTP | other *** search
-
-
-
- Uniform Resource Locators Tim Berners-Lee
- INTERNET DRAFT CERN
- IETF URL Working Group 30 March 1993
- Expires: September 30, 1993
-
-
- Uniform Resource Locators
-
-
- Status of this memo
-
- This document is an Internet Draft. Internet Drafts are
- working documents of the Internet Engineering Task Force
- (IETF), its Areas, and its Working Groups. Note that other
- groups may also distribute working documents as Internet
- Drafts.
- Internet Drafts are working documents valid for a maximum of
- six months. Internet Drafts may be updated, replaced, or
- obsoleted by other documents at any time. It is not
- appropriate to use Internet Drafts as reference material or to
- cite them other than as a "working draft" or "work in
- progress".
- Distribution of this document is unlimited. Please send
- comments to the author as timbl@info.cern.ch. or to the
- discussion list ietf-url@merit.edu.
-
- Abstract
-
- Many protocols and systems for document search and retrieval
- are currently in use, and many more protocols or refinements
- of existing protocols are to be expected in a field whose
- expansion is explosive.
- These systems are aiming to achieve global search and
- readership of documents across differing computing platforms,
- and despite a plethora of protocols and data formats. As
- protocols evolve, gateways can allow global access to remain
- possible. As data formats evolve, format conversion programs
- can preserve global access. There is one area, however, in
- which it is impractical to make conversions, and that is in
- the names used to identify objects. This is because names of
- objects are passed on in so many ways, from the backs of
- envelopes to hypertext objects, and may have a long life.
- This paper discusses the requirements on a universal naming
- syntax which can be used to refer to objects available using
- existing protocols, and may be extended with technology. It
- makes a recommendation for a generic syntax, and for specific
- forms for "Uniform Resource Locators" (URLs) of objects
- accessible using existing Internet protocols.
-
- Uniform Resource Locators Berners-Lee
-
-
-
- Terms
-
- The objects on the network which are to be named include
- typically objects which can be retrieved, and objects which
- can be searched. There is a great variety of other objects
- which may support other operations. We imply nothing about
- the contents of objects in this document. Whereas human-
- readable documents are currently the center of interest of the
- field, we envisage all aspects discussed in this paper
- applying to generalised objects when systems to handle them
- become available. The "object" is the unit of reference and
- need not correspond to any unit of storage. We refer to
- objects which can be searched as "indexes". We emphasise that
- this is the abstract view of the client, and these objects
- need not correspond to physical files on computers. We refer
- to the person who does the retrieval or searching as the user.
- We use the terms "name" for a string of characters
- describing a document, which allows it (and it alone) to be
- found. The term "address" is reserved for an string which
- specifies a more or less physical location. The term
- "locator" refers to a URL as here defined.
-
- Requirements
-
- This section discusses requirements for URLs, as an
- introduction of and background for the Recommendations
- section.
-
- Uses of names and addresses
-
- A name allows a user, with the help of a "client" program,
- to retrieve or operate on objects via a "server" program. A
- name may be passed for example:
- - In communication of any form between two people, to
- refer to a document, or part of a document;
- - As part of the description of a link associated with
- a hypertext document;
- - As part of the result of searching an index.
- Some typical requirements on a name which are met to a
- varying degree by various schemes are for example that the
- name is
- Persistent A given name will remain valid as long as it
- is needed;
- Extensible A given naming syntax will remain valid
- through the introduction of new protocols and
- directory technologies;
- Resolvable A name will contain enough information to
- allow the document or index to which it
- refers to be accessed, perhaps via resolution
- into an intermediate, more physical, name.
-
-
-
- Internet Draft 2 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
- Unique The fact that two names are identical implies
- that the objects named are the same in some
- way
- The syntax discussed is the syntax of one name, be it a
- lasting name or a physical address. When a directory server
- or hypertext link contains a set of alternative names, then
- that is beyond the scope of this syntax. Similarly, a syntax
- for describing a compound object is outside the scope of this
- syntax. The specific locator name spaces (defined under the
- umbrella of the general syntax) each meet the requirements
- above to a greater or lesser extent.
-
- Current practice
-
- Current protocols use many different standards for names.
- For some protocols, such as ISO-10163 Search and Retrieve
- protocol[16], the names returned in a search are only valid
- during the session. For others, such as FTP[9], they are
- lasting names which may be used for object retrieval at a
- later time. Typically, however, they are not long-lasting
- names which are independent of the location of the object.
- Such names may be provided using directory servers such as
- x.500. They will refer to the registration, however formal or
- informal, of a object with a particular organisation or
- person. Both hypertext and manual references rely on long-
- lasting names.
- Current names are basically location specifiers (addresses).
- These may be known as Uniform Resource Locators (URLs). They
- give the necessary parts of an address for a reader to access
- an information provider using the given protocol, and ask for
- the object required. Examples of names used by various
- protocols include
-
- File Transfer Protocol (Postel 1985):
- Host name or IP-address
- [IP port]
- [user name, password]
- Filename
-
- W.A.I.S. (Kahle 1990) Host name or IP-address
- [IP port]
- database name
- local document id
-
- Gopher (Alberti 1991) Host name or IP-address
- [IP port]
- database name
- selector string
-
- HTTP (Berners-Lee 1991) Host name or IP-address
- [IP port]
- local object id
-
-
- Internet Draft 3 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
-
- NNTP (Kantor 1986) group Group name
-
- NNTP article Host name
- unique message identifier
-
- x.500 distinguished name Country
- Organisation
- Organisational unit
- Person
- Local object identifier
-
-
- Other systems with their own naming schemes include BITNET
- "LISTSERV" application, FTAM file retrieval, SQLnetTM remote
- database search, proprietary distributed file systems, etc.
- Conventional syntax for writing these addresses involve
- various forms of punctuation to separate these parts. This
- sometimes, but not always, allows the naming scheme to be
- deduced from the punctuation. For example, a name of the form
- xxx.yyy.zz.edu:/pub.aa.bb.cc often implies anonymous FTP
- access. However, there is no well-defined algorithm for
- parsing an arbitrary name, as there is no common syntax.
-
-
- Expandability
-
-
- There will necessarily be a phase during which lasting names
- will become more common, as the deployment of directory
- services increases to the point where every user has direct
- or indirect access to one. Even then, however, one can
- envisage more than one competing directory system, and cases
- in which physical names are still required. A directory
- service takes a lasting name and reduces it to a physical
- address (or set of addresses) which, though less useful for
- lasting reference, is the only way to actually retrieve the
- object.
-
- An addressing syntax is required which will be able to
- encompass existing physical address spaces, and be extendible
- to any future protocols. This requires that it contain an
- identifier for the protocol in use. The format of the rest of
- the address will necessarily depend to a certain extent on the
- protocol.
-
-
- Relevance
-
- The life of a name is limited by any information contained
- within it which may become prematurely invalid. It is
- therefore necessary to limit the contents of a name to the
-
-
- Internet Draft 4 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
- information required for the operations above. Other
- extraneous information about the object (its size, data
- format, authorisation details, etc.) may in general change
- with time and should not be part of the name.
- One might expect such information to be part of the "header"
- of a object, and for protocols to allow the header information
- to be retrieved independently of the objects themselves.
- Any physical address may be subject to change with time:
- hence we encourage the move to lasting names and directory
- services.
-
- Uniqueness
-
- Clearly one requires uniqueness in the sense that one name
- should refer to only one logical object. This is the case with
- all the addressing schemes in use, whether they are directory
- systems or physical addresses. (The internet addresses all
- rely on the domain name (Mockapetris 1987) of the host to
- achieve this).
- However, given that names can be translated, many apparently
- different names may lead to the same object. Any object may
- therefore be referred to by many names. One needs to be able
- to know whether two objects, retrieved through different
- paths, are in fact the same object.
- It is suggested that each object have one "official" name.
- This name could be stored in the object in some
- representations, or stored in a database accessible to the
- server, for example. Any references within that object
- should be parsed in the context of the official name. In the
- presence of a directory service, the official name will
- normally be the registered name of the object. However, a name
- in any scheme will do, so long as it is completely specified.
- On systems which do not allow the name to be stored (such as
- anonymous FTP archive sites), a possible ambiguity will always
- exist as to whether two similarly named objects are in fact
- the same.
- Note that Internet newsgroup names are unique world-wide,
- and news articles carry a unique message id.
- In most other cases, however, there is no guarantee that
- dereferencing a URL will work, or that if it does the object
- it refers to will in fact be the object intended. URLs such
- as FTP addresses are transient in that files may be moved and
- even replaced by different files of the same name. This
- disorganisation may be limited by good server management, but
- a naming scheme which is independent also of internet host
- name is obviously preferable.
-
- Readability by people
-
- This requirement has been put forward by several people
- (Clifford Lynch, Douglas Engelbart among others), and disputed
- by others. The author's view is that it will be a while
-
-
- Internet Draft 5 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
- before technology and standardisation have reached the point
- at which names and addresses will be hidden from human beings.
- As long as they must be written on the backs of envelopes and
- "cut and pasted" between workstation windows, there is a
- strong need for names to be
- . Short
- . Composed of printable (preferably non-white)
- characters
- . To a certain extent, understadable by a human being.
-
-
- Structure of names and addresses.
-
-
- A physical address is required in order for
-
- . The user's program to contact the server
- . The server to search and index, retrieve a object,
- or look up the name;
- . The user's program to locate an individual position
- or element within a object.
-
- This suggests that a name be structured, such that the parts
- necessary for these three operations be separate and only
- used by those system elements which need those parts. This
- corresponds to the basic principle of information hiding. In
- fact, four parts are necessary, including the indicator of
- the naming scheme to be used:
-
- . The naming scheme: a registered identifier for the
- protocol.
- . The name of a suitable server. The format of this
- part must be well defined. It will depend on the
- lower-layer protocols in use. Systems which use
- widely distributed information, such as x.500 and
- NNTP, do not need this part as each client generally
- contacts his nearest server (or a particular
- server).
- . Information to be passed to the server. This may be
- private to the server, as all names may be generated
- and used by the same server. The client should
- normally be transparent to this part of the name.
- . Information to be used by the application once the
- object has been retrieved. This part is private to
- the application (or, more strictly, the data format)
- and so cannot be defined here.
-
- Both lasting names and physical addresses often share a
- hierarchical structure. This follows often from the
- organisation of the system. From the naming point of view, it
- has the advantage that a reference in one object to another
-
-
-
- Internet Draft 6 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
- object need not include that part of the structure which is
- common to both names.
-
- Choices
-
- The requirements above leave little room for choice save for
- the order and punctuation of the elements of an address. It
- is only reasonable for the order of writing of the parts to be
- consistently from left to right (or right to left) with
- increasing specificity. Punctuation schemes fall into two
- categories (Huitema 1991): tagged schemes in which field are
- given names, and fields which use special characters and field
- order. The latter tend to be more compact schemes.
-
- protocol: aftp host: xxx.yyy.edu path:
- /pub/doc/README
-
- PR=aftp; H=xx.yy.edu; PA=/pub/doc/README;
-
- PR:aftp/xx.yy.edu/pub/doc/README
-
- /aftp/xx.yy.edu/pub/doc/README
-
- Fig 1. Some alternative tagged and untagged representations
-
-
- The choice of special symbols for punctuation tends to be a
- matter of taste. It is easier to read addresses whose symbols
- correspond to those of one's favourite operating system. A
- variety of symbols is needed so that when a name is
- abbreviated it is possible to tell which parts have been
- omitted. The recommendation below uses special characters in
- order to achieve a compact name, and uses where possible
- punctuation symbols established in the internet or unix
- community.
- The choice of escape character for introducing
- representations of non-allowed characters also tends to be a
- matter of taste. An ANSI standard exists in the C language,
- using the back-slash character "\". The use of this character
- on unix command lines, however, can be a problem as it is
- interpreted by many shell programs, and would have itself to
- be escaped.
- The use of white space characters has been avoided in URLs:
- spaces are not legal characters. This was done because of
- the frequent introduction of extraneous white space when lines
- are wrapped by systems such as mail, or sheer necessity of
- narrow column width, and because of the inter-conversion of
- various forms of white space which occurs during character
- code conversion and the transfer of text between applications.
-
-
-
-
-
- Internet Draft 7 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
-
- Recommendations
-
- This section describes the syntax for "Uniform Resource
- Locators" (URLs): that is, basically physical addresses of
- objects which are retrievable using protocols already deployed
- on the net. The generic syntax provides a framework for new
- schemes for names to be resolved using as yet undefined
- protocols.
- The syntax is described in two parts. Firstly, the syntax
- rules of a completely specified name are given; secondly, the
- rules under which parts of the name may be omitted in a well-
- defined context.
-
- Full form
-
-
- A complete URL consists of a naming scheme specifier
- followed by a string whose format is a function of the naming
- scheme. For locators of information on the internet, a common
- syntax is used for the IP address part. A BNF description of
- the URL syntax is given in an a later section. The components
- are as follows.
-
- Anchor-id
-
- This represents a part of, fragment of, or a sub-function
- within, an object or object. Its syntax and semantics are
- defined by the application responsible for the object, or the
- specification of the content type of the object. The only
- definition here is of the allowed characters by which it may
- be represented in a URL.
- The anchor-id follows the URL of the whole object from which
- it is separated by a hash sign (#). If the anchor-id is void,
- the hash sign may be omitted: A void anchor-id with or without
- the hash sign means that the URL refers to the whole object.
- While this hook is allowed for identification of fragments,
- the question of addressing of parts of objects, or of the
- grouping of objects and relationship between contined and
- containing objects, is not addressed by this object.
- This object does not address the question of objects which
- are different versions of a "living" object, nor of expressing
- the relationships between different versions and the living
- object.
-
- Scheme
-
-
- Within the URL of a object, the first element is the name of
- the scheme, separated from the rest of the object by a colon.
- The rest of the URL follows the colon in a format depending on
- the scheme.
-
-
- Internet Draft 8 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
-
-
- Internet protocol parts
-
-
- Those schemes which refer to internet protocols have a
- common syntax for the rest of the object name. This starts
- with a double slash "//" to indicate its presence, and
- continues until the following slash "/". Within that section
- are
-
- . An optional user name, if this must be quoted to the
- server, followed by a commercial at sign "@". (Use
- of this field is discouraged. Provision of encoding
- a password after the user name, delimited by a
- colon, could be made but obviously is only useful
- when the password is public, in which case it
- should not be necessary, so that is also
- discouraged.)
- . The internet domain name of the host in RFC1037
- format (or, optionally and less advisably, the IP
- address as a set of four decimal digits)
- . The port number, if it is not the default number for
- the protocol, is given in decimal notation after a
- colon.
-
-
- Path
-
- The rest of the locator is known as the "path". It may
- define details of how the client should communicate with the
- server, including information to be passed transparently to
- the server without any processing by the client.
- The path is interpreted in a manner dependent on the
- protocol being used. However, when it contains slashes, these
- must imply a hierarchical structure.
-
- Partial form
-
- In a certain limited set of cases, generally within a
- certain application, it may be useful to pass only a section
- of the URL. Within a object whose URL is well defined, the URL
- of another object may be given in abbreviated form, where
- parts of the two URLs are the same. This allows objects within
- a group to refer to each other without requiring the space for
- a complete reference, and it incidentally allows the group of
- objects to be moved without changing any references. This is
- not discussed in detail here, it is only mentioned so that the
- characters required by the technique be reserved for that
- purpose. It must be emphasised that when a reference is
- passed in anything other than a well controlled context, the
- full form must always be used.
-
-
- Internet Draft 9 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
- The partial form relies on a property of the URL syntax that
- certain characters ("/") and certain path elements ("..", ".")
- have a significance reserved for representing a hierarchical
- space, and must be recognised as such by both clients and
- servers.
- A partial form can be distinguished from a full form in that
- a full form must have a colon and that colon must occur before
- any slash characters.
- The rules for the use of a partial name are:
- . If the scheme parts are different, the whole
- absolute locator must be given. Otherwise, the
- scheme is omitted, and:
- . If the host and/or port parts are the different, the
- host, port name and all the rest of the locator must
- be given.
- . If the access and host parts are the same, then the
- path may be given in absolute (fully qualified) or
- relative form. Within the path:
- . If a leading slash is present, the path is absolute.
- Otherwise, a relative path is interpreted as
- follows:
- . The last part of the path of the context locator
- (anything following the rightmost slash) is removed,
- and the given partial URL appended in its place.
- . Within the result, all occurrences of "/xxx/.." or
- "/." are recursively removed, where xxx, ".." and
- "." are complete path elements.
-
-
- Mapping Local Names
-
- When a system uses a local addressing scheme, it is useful
- to provide a mapping from local addresses into URLs so that
- references to objects within the addressing scheme may be
- referred to globally, and possibly accessed through gateway
- servers.
- Any mapping scheme may be defined provided it is
- unambiguous, reversible, and provides valid URLs. It is
- recommended that where hierarchical aspects to the local
- naming scheme exist, they be mapped onto the hierarchical URL
- path syntax in order to allow the partial form to be used.
- The following escaping method is used for mapping WAIS, FTP
- and Gopher addresses onto URLs. Where the local naming scheme
- uses ASCII characters which are not allowed in the URL, these
- may be represented in the URL by a percent sign "%" followed
- by two hexadecimal digits (0-9, A-F) giving the ASCII value
- for that character. If non-ASCII characters are used, then a
- similar escaping system should be used. Character codes other
- than those allowed by the syntax shall not be used in a URL.
- The same considerations apply to mapping local anchor
- identifiers onto the anchorid part of a URL.
-
-
-
- Internet Draft 10 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
-
- Specific Naming Schemes
-
- The mapping for some existing standard and experimental
- protocols is outlined in the BNF syntax definition. Notes on
- particular protocols follow.
-
- HTTP
-
- The HTTP protocol specifies that the path is handled
- transparently by those who handle URLs, except for the servers
- which dereference them. The path is passed by the client to
- the server with any request, but is not otherwise understood
- by the client. The anchorid part is not sent with the
- request. The search part, if present, is sent.
-
- FTP
-
- The ftp: prefix indicates a file which is to be picked up
- from the file system of the given host. The FTP protocol is
- used. The port number if given gives the port of the FTP
- server if not the FTP default. (A client may in practice use
- local file access to retrieve objects which are available
- though more efficient means such as local file open or NFS
- mounting, where this is available and equivalent)
- The syntax allows for the inclusion of a user name and even
- a password for those systems which do not use the anonymous
- FTP convention. The default, however, if no user or password
- is supplied, will be to use that convention, viz. that the
- user name is "anonymous" and the password the user's mail
- address.
- The adoption of a unix-style syntax involves the conversion
- into non-unix local forms by either the client or server. Some
- non-unix servers do this, but clients wishing to access sites
- which do not have unix-style naming will need certain
- algorithms to enable other file systems to be identified and
- treated. Client software may also have to be flexible in
- terms of the sequence of FTP commands used with different
- varieties of server. In view of a tendency for file systems
- to look increasingly similar, it was felt that the URL
- convention should not be weighed down by extra mechanisms for
- identifying these cases.
- The data format of a file can only, in the general FTP case,
- be deduced from the name, normally the suffix of the name.
- This is not standardised. The transfer mode (binary or text)
- must in turn be deduced from the data format. It is
- recommended that conventions for suffixes of public archives
- be established, but it outside the scope of this paper.
-
- Andrew File System
-
- The afs: prefix indicates a following afs cellname and path.
-
-
- Internet Draft 11 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
-
- News
-
- The news locators refer to either news group names or
- article message identifiers which must conform to the rules of
- RFC 850. A message identifier may be distinguished from a
- news group name by the presence of the commercial at "@"
- character. These rules imply that within an article, a
- reference to a news group or to another article will be a
- valid URL (in the partial form).
- Note: An outstanding problem is that the message identifier
- is insufficient to allow the retrieval of an expired article,
- as no algorithm exists for deriving an archive site and
- filename. The addition of the date and news group set to the
- article's URL would allow this if a directory existed of
- archive sites by news group. Suggested subject of study in
- conjunction with NNTP WG. Further extension possible may be
- to allow the naming of subject threads as addressable objects.
-
- WAIS
-
- The current WAIS implementation public domain requires that
- a client know the "type" and length of a object prior to
- retrieval. These values are returned along with the internal
- object identifier in the search response. They have been
- encoded into the path part of the URL in order to make the URL
- sufficient for the retrieval of the object. If changes to
- WAIS specs make the internal id something which is sufficient
- for later retrieval then this will not be necessary.
- Within the WAIS world, names do not of course not need to be
- prefixed by "wais:" (by the partial form rules).
-
- Prospero
-
- The Prospero (Neuman, 1991) UDP-based virtual file system
- protocol is used. The host and port parts are used, and
- optional. The significance of the path part may be the name
- of a file, or anything else according to the server. If the
- path ends with a final slash "/" that indicates to the client
- that the object is a directory to be listed.. Prospero links
- of the form EXTERNAL are converted into URLs of non-prospero
- naming schemes (such as "ftp:").
- The path may, as well as the Prospero "Host Specific Object
- Name" (HSOName) have other following Prospero fields,
- currently version and URN. These are appended to the HSOName
- and separated by the characters "%23" (percent, two, three),
- this being the escaped form of the Prospero hash sign
- delimiter.
-
-
-
-
-
-
- Internet Draft 12 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
-
- Gopher
-
- The first character of the URL path part (after the initial
- single slash) is a single-character "type" field which is that
- used by the Gopher protocol. The rest of the path is the
- "selector string", with unprintable characters and spaces
- encoded.
- Gopher links which refer to different protocols may be
- converted into URLs for those protocols.
-
- Telnet, rlogin, tn3270
-
- The use of URLs to represent interactive sessions is a
- convenient extension to their uses for objects. This allows
- access to information systems which only provide an
- interactive service, and no information server. As
- information within the service cannot be addressed
- individually or, in general, automatically retrieved, this is
- a less desirable, though currently common, solution.
-
- x500
-
- The mapping of x500 names onto URLs is not defined here. A
- decision is required as to whether "distinguished names" or
- "user friendly names" (ufn), or both, should be allowed. If
- any punctuation conversions are needed from the adopted x500
- representation (such as the use of slashes between parts of a
- ufn) they must be defined. This is a subject for study.
-
- WHOIS
-
- This prefix describes the access using the "whois++" scheme
- in the process of definition. The hostname part is the same as
- for other IP based schemes. The path part can be either a
- whois handle for a whosi object, or it can be a valid whois
- query string. This is a subject for further study.
-
- Network Management Database
-
- This is a subject for study.
-
-
- Registration of naming schemes
-
- A new naming scheme may be introduced by defining a mapping
- onto a conforming URL syntax, using a new scheme identifier.
- Experimental scheme identifiers may be used by mutual
- agreement between parties, and must start with the characters
- "x-".
- It is proposed that the Internet Assigned Numbers Authority
- (IANA) perform the function of registration of new schemes.
-
-
- Internet Draft 13 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
- Any submission of a new scheme must include a definition of an
- algorithm for the retrieval of any object within that scheme.
- The algorithm must take the URL and produce either a set of
- URL(s) which will lead to the desired object, or the object
- itself, in a well-defined or determinable format. It is
- recommended that those proposing a new scheme demonstrate its
- utility and operability by the provision of a gateway which
- will provide images of objects in the new scheme for clients
- using an existing protocol.
- It is likewise recommended that, where a protocol allows for
- retrieval by URL, that the client software have provision for
- being configured to use specific gateway locators for indirect
- access through new naming schemes.
-
- BNF syntax
-
- This is a BNF-like description of the Uniform Resource
- Locator syntax. A vertical line "|" indicates alternatives,
- and [brackets] indicate optional parts. Spaces are
- representational only: no spaces are actually allowed within a
- URL. Single letters stand for single letters. All words of
- more than one letter below are entities described somewhere in
- this description.
-
- anchoraddress docaddress [ # anchorid ]
- docaddress generic | httpaddress | fileaddress |
- newsaddress | prosperoaddress | telnetaddress
- | gopheraddress | waisaddress | afsaddress
- generic scheme : path [ ? search ]
- scheme ialpha
- httpaddress h t t p : / / hostport [ / path ] [ ?
- search ]
- fileaddress f t p : / / host / path
- afsaddress a f s : / / cellname / path
- newsaddress n e w s : groupart
- waisaddress waisindex | waisdoc
- waisindex w a i s : / / hostport / database [ ? search
- ]
- waisdoc w a i s : / / hostport / database / wtype /
- digits / path
- groupart * | group | article
- group ialpha [ . group ]
- article xalphas @ host
- database xalphas
- wtype xalphas
- prosperoaddress p r o s p e r o : / / path
- telnetaddress t e l n e t : / / [ user @ ] hostport
- gopheraddress g o p h e r : / / hostport [/ gtype [ /
- selector ] ] [ ? search ]
- hostport host [ : port ]
- host hostname | hostnumber
- cellname hostname
-
-
- Internet Draft 14 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
- hostname ialpha [ . hostname ]
- hostnumber digits . digits . digits . digits
- port digits
- selector path
- path void | xpalphas [ / path ]
- search xalphas [ + search ]
- user xalphas
- anchorid xalphas
- gtype xalpha
- xalpha alpha | $ | - | _ | @ | ! | % | ^ | & | * |
- ( | ) | . | digit
- xalphas xalpha [ xalphas ]
- xpalpha xalpha | +
- xpalphas xpalpha [ xpalpha ]
- ialpha alpha [ xalphas ]
- alpha a | b | c | d | e | f | g | h | i | j | k | l
- | m | n | o | p | q | r | s | t | u | v | w
- | x | y | z | A | B | C | D | E | F | G | H
- | I | J | K | L | M | N | O | P | Q | R | S
- | T | U | V | W | X | Y | Z
- digit 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- digits digit [ digits ]
- alphanum alpha | digit
- alphanums alphanum [ alphanums ]
- void
-
-
-
- Security considerations
-
- The URL scheme does not in itself pose a security threat.
- Users should beware that there is no general guarantee that a
- URL which at one time points to a given object continues to do
- so, and does not even at some later time point to a different
- object due to the movement of objects on servers.
-
- Conclusion
-
- A need has been demonstrated, and a number of requirements
- have been stated for uniform resource locators (URLs). A
- scheme has been proposed which builds on existing conventions
- to define a syntax for URLs. Adoption of the scheme in
- correspondence, standards and software will ease the use of
- references to on-line information in a flexible way as the
- coming information age arrives.
-
- Acknowledgements
-
- This paper builds on much discussion of these issues by many
- people on the network. The discussion was particularly
- stimulated by articles by Clifford Lynch (1991), Brewster
- Kahle (1991) and Wengyik Yeong (1991b). Contributions from
-
-
- Internet Draft 15 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
- John Curran (NEARnet), Clifford Neuman (ISI) Ed Vielmetti
- (MSEN) and later the IETF URL working group have been
- incorporated into this issue of this paper.
- The draft url4 was generated from url3 following discussion
- and overall approval of the URL working group on 29 March
- 1993. The paper url3 had been generated from udi2 in the light
- of discussion at the UDI BOF meeting at the Boston IETF in
- July 1992.
-
- References
-
- Alberti, R., et.al. (1991) "Notes on the Internet Gopher
- Protocol" University of Minnesota, December 1991,
- URL=file://boombox.micro.umn.edu/pub/gopher/gopher_protocol
- . See also
- URL=gopher://gopher.micro.umn.edu:70/00/Information%20About
- %20Gopher/About%20Gopher
- Berners-Lee, T., (1991) "HTTP as implemented in WWW", CERN,
- December 1991,
- URL=file://info.cern.ch./pub/www/doc/http.txt
- Davis, F, et al., (1990) "WAIS Interface Protocol: Prototype
- Functional Specification", Thinking Machines Corporation,
- April 23, 1990
- URL=file://quake.think.com/pub/wais/doc/protspec.txt
- International Standards Organization, (1991) Information and
- Documentation - Search and Retrieve Application Protocol
- Specification for open Systems Interconnection, ISO-10163
- Huitema, C., (1991) "Naming: strategies and techniques",
- Computer Networks and ISDN Systems 23 (1991) 107-110.
- Kahle, Brewster, (1991) "Document Identifiers, or
- International Standard Book Numbers for the Electronic
- Age", URL=file://quake.think.com/pub/wais/doc/doc-ids.txt
- Kantor, B., and Lapsley, P., (1986) "A proposed standard for
- the stream-based transmission of news", Internet RFC-977,
- February 1986. URL=file://nnsc.nsf.net/rfc/rfc977.txt
- Lynch, C., Coallition for Networked Information: (1991)
- "Workshop on ID and Reference Structures for Networked
- Information", November 1991. See
- URL=wais://quake.think.com/wais-discussion-archives?lynch
- Mockapetris, P., (1987) "Domain names + concepts and
- facilities", RFC-1034, USC-ISI, November 1987,
- URL=file://nnsc.nsf.net/rfc/rfc1034.txt
- Neuman, B. Clifford, (1992) "Prospero: A Tool for Organizing
- Internet Resources", Electronic Networking: Research,
- Applications and Policy, Vol 1 No 2, Meckler Westport CT
- USA. See also
- URL=file://prospero.isi.edu/pub/prospero/oir.ps
- Postel, J. and Reynolds, J. (1985) "File Transfer Protocol
- (FTP)", Internet RFC-959, October 1985.
- URL=file://nnsc.nsf.net/rfc/rfc959.txt
- Yeong, W., (1991a) "Towards Networked Information Retrieval",
- Technical report 91-06-25-01, June 1991, Performance
-
-
- Internet Draft 16 March 1993
-
- Uniform Resource Locators Berners-Lee
-
-
- Systems International, Inc.
- URL=file://uu.psi.com/wp/nir.txt
- Yeong, W., (1991b), "Representing Public Archives in the
- Directory", Internet Draft, November 1991. In
- URL=wais://nnsc.nsf.net/internet-drafts?yeong
-
-
- Author's address
-
-
- Tim Berners-Lee
- World-Wide Web project
- CERN, 1211 Geneva 23, Switzerland
- +41 (22)767 3755
- timbl@info.cern.ch
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Internet Draft 17 March 1993
-
-